Maximum likelihood analysis of the first 
KamLAND results 



A. Ianni a



a) INFN - Laboratori Nazionali del Gran Sasso, 
S.S. 17bis Km 18+910, I-67010 Assergi (Aquila), ITALY

Abstract 

A maximum likelihood approach has been used to analyze the first
results from KamLAND, emphasizing the application of this method
to low-statistics samples. The goodness of fit has been determined
with a simple Monte Carlo approach in order to test two different
null hypotheses. It turns out that with the present statistics the
neutrino oscillation hypothesis has a significance of about 90% (the
best-fit values of the oscillation parameters from KamLAND are found to
be $\delta m^2 \approx 7.1 \times 10^{-5}$ eV$^2$ and $\sin^2\theta_{12} = 0.424/0.576$), while the
no-oscillation hypothesis has a significance of about 50%. Through the
likelihood ratio the hypothesis of no disappearance is rejected at about
99.9% C.L. with the present data from the positron spectrum. A comparison
with other analyses is presented.



1 Introduction 

The purpose of this paper is to perform a likelihood analysis of the first 
KamLAND results [1]. KamLAND is a reactor-based neutrino oscillation 
experiment with a baseline (source-detector distance) larger than 100 km [2]. 
This allows KamLAND to explore, with a terrestrial anti-neutrino beam, the part
of the oscillation parameter space which is of interest for solar neutrinos. In
particular, KamLAND can test the Large Mixing Angle (LMA) solution of
the solar neutrino puzzle [3].

Experimental measurements [4, 7, 6, 8, 9] of the solar neutrino flux on
Earth seem to show that these particles undergo matter-enhanced flavor
transformations (MSW effect) [10, 3]. However, a global analysis which takes
into account only the solar neutrino measurements cannot identify a unique
solution for the parameters which drive the oscillations [11, 12]. Including the
new results from KamLAND, the scenario changes and only one possibility,
namely the LMA solution mentioned above, survives [11, 12, 13, 14, 15, 16, 17, 18].

In the following we present a new analysis of the KamLAND data. Although
performed with only the first results, this analysis highlights a few features,
such as the importance of the systematic uncertainties, which may affect
future data treatments.

In Fig. 1 we show the KamLAND results from [1]. It can be noticed
that the statistics are rather poor at the moment and 5 bins have zero entries.
Therefore, a least-squares analysis of the data may not be appropriate. For
low-count experiments such as this one, the method of analysis commonly
used is Maximum Likelihood (ML). So, in this paper we attempt
to perform an ML analysis. In order to define our likelihood function we
introduce some definitions. For the experimental KamLAND results, we call
$N^{obs}$ and $N_i^{obs}$ the total number of observations and the number of entries in
the $i$th bin, respectively. Moreover, we assume that $N_i^{obs}$ is a Poisson random
variable. Hence, the joint p.d.f. for the set of data shown in Fig. 1 is the
product of Poisson distributions [19]:

$$L(v) = \prod_{i=1}^{N} \frac{\left(N_i^{th}(v)\right)^{N_i^{obs}}}{N_i^{obs}!} \exp\left(-N_i^{th}(v)\right) \qquad (1)$$

where $N_i^{th}$ is the expected number of entries in the $i$th bin and $v$ is the vector
of unknown parameters we wish to estimate through the ML method. According
to the ML prescription, in order to estimate the unknown parameters we
should maximize the likelihood function. This is usually done through the
log-likelihood function, which in the large-N limit$^1$ (i.e. with high statistics)
is parabolic. For the log-likelihood we write

$$\ln L(v) = -\int_a^b f(x; v)\,dx + \sum_{i=1}^{N} N_i^{obs} \ln\left(\int_{x_i}^{x_{i+1}} f(x; v)\,dx\right) \qquad (2)$$

where $f(x; v)$ is the function which describes the physical process under
investigation$^2$, and $a$ and $b$ define the region of interest for the measured
random variable $x$.

1 Here, we call N the data sample size.
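As an illustration of eq. (2), the following minimal Python sketch evaluates the binned log-likelihood under the assumption that the bins tile the region of interest, so that the integral of $f$ over $[a, b]$ equals the sum of the expected entries per bin. The function names, the toy spectrum shape and the toy counts are hypothetical (they are not KamLAND data), and the constant term $-\sum_i \ln N_i^{obs}!$ is dropped because it does not depend on the parameters.

```python
import math


def binned_log_likelihood(n_obs, n_th):
    """Poisson log-likelihood of eq. (2), up to a parameter-independent constant.

    n_obs: observed entries per bin; n_th: expected entries per bin (all > 0).
    """
    if len(n_obs) != len(n_th):
        raise ValueError("n_obs and n_th must have the same number of bins")
    total = 0.0
    for observed, expected in zip(n_obs, n_th):
        if expected <= 0.0:
            raise ValueError("expected entries must be positive")
        # A bin with zero observed entries still contributes -expected.
        total += observed * math.log(expected) - expected
    return total


# Toy example (hypothetical numbers): a fixed spectrum shape with a free scale.
toy_shape = [4.0, 3.0, 2.0, 1.0, 0.5]
toy_counts = [3, 4, 0, 1, 0]


def toy_log_likelihood(scale):
    return binned_log_likelihood(toy_counts, [scale * s for s in toy_shape])


# For a pure scale parameter the maximum sits at (total observed) / (total shape).
scale_hat = sum(toy_counts) / sum(toy_shape)
```

Bins with zero entries need no special treatment here, which is one reason this form is convenient for sparse histograms.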

Once the parameters have been estimated using the log-likelihood, the
difficult task is to determine the confidence intervals for the same parameters.
In the large-N limit this is done through the covariance matrix, using the
second derivatives of the log-likelihood function, or through the relation [19,
20]

$$\ln L(v) = \ln L_{max} - \frac{Q}{2} \qquad (3)$$

where $Q$ defines the confidence region as a function of the number of
parameters. Values of $Q$ are tabulated, for example, in [19]. Eq. (3) can also be
used when the likelihood function is not Gaussian, i.e. in the small-N limit.
In this case, however, the classical definition of confidence region is only
approximated by eq. (3). Depending on how accurately the uncertainties should
be reported, one could try to estimate the level of the approximation by a
Monte Carlo [19]. For a multidimensional likelihood an alternative approach
to confidence region estimation is to maximize the log-likelihood function
with respect to all parameters but one. The profile of $\ln L_{max}$ against the
remaining parameter can be treated as a one-dimensional problem and, as an
example, $Q = 1$ in eq. (3) then corresponds to a 68.3% confidence interval.
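The $Q = 1$ rule of eq. (3) can be tried on the simplest low-statistics case, a single Poisson count. The sketch below is an illustration only: the choice of 5 observed events, the function names and the bisection search are assumptions made for this example and are not taken from the analysis.

```python
import math


def poisson_log_likelihood(mu, n_observed):
    """ln L for one Poisson count, dropping the constant -ln(n_observed!)."""
    return n_observed * math.log(mu) - mu


def likelihood_interval(n_observed, q=1.0, tolerance=1e-9):
    """Return (low, high) where ln L has dropped by q/2 from its maximum.

    Assumes n_observed >= 1 so that the ML estimate is strictly positive.
    """
    mu_hat = float(n_observed)  # ML estimate of a Poisson mean
    target = poisson_log_likelihood(mu_hat, n_observed) - q / 2.0

    def crossing(inside, outside):
        # Bisection: ln L >= target at `inside`, ln L < target at `outside`.
        while abs(outside - inside) > tolerance:
            middle = 0.5 * (inside + outside)
            if poisson_log_likelihood(middle, n_observed) >= target:
                inside = middle
            else:
                outside = middle
        return 0.5 * (inside + outside)

    low = crossing(mu_hat, 1e-12)
    high = crossing(mu_hat, mu_hat + 10.0 * math.sqrt(mu_hat) + 10.0)
    return low, high


low, high = likelihood_interval(5)
```

The resulting interval is asymmetric around the estimate, unlike the symmetric $\pm\sqrt{5}$ interval that a parabolic log-likelihood would give.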

Finally, once the parameters and their uncertainties have been estimated,
a goodness-of-fit calculation can be carried out through the classical Pearson's
$\chi^2_{Per}$ test, where

$$\chi^2_{Per} = \sum_{i=1}^{N} \frac{\left(N_i^{obs} - N_i^{th}\right)^2}{N_i^{th}} \qquad (4)$$

Eq. (4) follows a $\chi^2$ distribution only in the large-N limit, and the rule of
thumb is that the number of entries in each bin of the experimental histogram
should satisfy $N_i^{obs} > 5$ [19, 21]. When the large-N limit is not reached, a Monte
Carlo study to determine the distribution of $\chi^2_{Per}$ should be performed. Only
in this way can a correct P-value be calculated.
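A Monte Carlo study of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: toy histograms are drawn from the fitted expectations without refitting each toy, the Poisson sampler uses Knuth's multiplication method (adequate for small means), and all function names and defaults are hypothetical.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (suitable for small means)."""
    limit = math.exp(-mean)
    count = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def pearson_chi2(n_obs, n_th):
    """Pearson statistic of eq. (4)."""
    return sum((o - t) ** 2 / t for o, t in zip(n_obs, n_th))


def mc_p_value(n_obs, n_th, n_toys=5000, seed=1):
    """Fraction of toy histograms whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = pearson_chi2(n_obs, n_th)
    at_or_above = 0
    for _ in range(n_toys):
        toy = [sample_poisson(rng, t) for t in n_th]
        if pearson_chi2(toy, n_th) >= observed:
            at_or_above += 1
    return at_or_above / n_toys
```

Comparing the value returned by `mc_p_value` with the tail probability of the corresponding $\chi^2$ distribution shows how far a given histogram is from the large-N regime.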

2 Here, $N_i^{th} = \int_{x_i}^{x_{i+1}} f(x; v)\,dx$.

2 Analysis of the first KamLAND results



Recently, KamLAND results have been analyzed [13, 11, 12, 14, 15, 16, 17, 18]
and used to estimate the solar neutrino oscillation parameters by performing
a global analysis including solar neutrino and short-baseline experiments.
In particular, in [13, 11, 12, 14, 15] KamLAND results have been analyzed
considering the statistic $\chi^2_P = -2\log\lambda_P$ $^3$, where $\lambda_P$ is a normalized likelihood
function given by the product of Poisson distributions [19]. So, $\lambda_P$ and the
likelihood used in this paper differ only by a factor and should have the
same best-fit parameters. Yet, the confidence region estimation requires a
Monte Carlo calculation when the statistics are poor, because only in the
large-N limit does $\chi^2_P$ follow a $\chi^2$ distribution with $N - p$ d.o.f., $p$ being the number
of estimated parameters [19, 21]. In [16, 17, 18] a different approach to
the statistical analysis has been used: a multidimensional $\chi^2$ function
is used with the covariance matrix calculated from the experimental errors
shown in Fig. 1 and the systematic uncertainty from [1]. Of course, the use
of a $\chi^2$ function implies the assumption of Gaussian errors. As pointed out
in Sec. 1, this may not be appropriate for the KamLAND data set.
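For reference, the Poisson likelihood-ratio statistic $\chi^2_P = -2\log\lambda_P$ can be written per bin as $2\,[N_i^{th} - N_i^{obs} + N_i^{obs}\ln(N_i^{obs}/N_i^{th})]$. The Python sketch below assumes this standard form is the one meant in the cited analyses; the function name is hypothetical.

```python
import math


def poisson_chi2(n_obs, n_th):
    """chi2_P = 2 * sum[N_th - N_obs + N_obs * ln(N_obs / N_th)] over the bins."""
    total = 0.0
    for observed, expected in zip(n_obs, n_th):
        total += expected - observed
        if observed > 0:
            # The logarithmic term vanishes for bins with zero entries.
            total += observed * math.log(observed / expected)
    return 2.0 * total
```

Unlike a Gaussian $\chi^2$ built from per-bin errors, this statistic remains well defined for empty bins, where each contributes $2 N_i^{th}$.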

In the following we have used the above considerations and the ML
method to analyze the KamLAND results and compare the findings with
the other methods already implemented.

For KamLAND and in a $2\nu$ scenario we write



where $E$ and $E'$ are the measured and real visible energy, respectively, and
$E_{\bar\nu_e} = E' + 0.8$ MeV is the energy of the incoming $\bar\nu_e$. Moreover, $A$ is a
normalization factor which accounts for the number of target protons, the detection
efficiency, the data taking time and conversion of units, $\sigma(E')$ is the
cross-section for the inverse $\beta$-decay [22], $P_i$ is the thermal power (in units of
GW) of the $i$th reactor and $d_i$ its distance from the KamLAND detector [2], and
$\phi(E')$ $^4$ is the $\bar\nu_e$ flux at the detector, weighted over the different fuel
components [2, 23] in U and Pu as from [1] and calculated using the differential
spectrum from [24]. In eq. (5), $R(E,E')$ is a Gaussian energy resolution
function with width $\sigma = 0.075\sqrt{E'(\mathrm{MeV})}$ [1].

3 The implementation of this $\chi^2$ is suggested by the Review of Particle Properties [21].
4 This flux takes into account the differential spectrum of the $\bar\nu_e$'s, the average energy
and the intensity fraction of each fuel component.

In order to determine the constant $A$, we have normalized eq. (5) by using
the information and the average expected number of events (86.8) from [1].
In addition, we introduce a correction factor, $a$, which accounts for the systematic
uncertainties. The $\bar\nu_e$ survival probability, $P_{\bar\nu_e}$, is written
In particular, to take into account the systematic uncertainties we have
multiplied the likelihood function in eq. (1) by a Gaussian in $a$
with mean value equal to one and width $\sigma_{sys} = 0.064$, the total systematic
uncertainty quoted in [1]. So, the log-likelihood function to maximize is

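$$\ln L(a, \delta m^2, \sin^2\theta_{12}) = -\int_{2.6\,\mathrm{MeV}}^{8.125\,\mathrm{MeV}} dE\, N^{th}(E, a, \delta m^2, \sin^2\theta_{12}) + \sum_{i=1}^{N} N_i^{obs} \ln\left[\int_{E_i}^{E_{i+1}} dE\, N^{th}(E, a, \delta m^2, \sin^2\theta_{12})\right] - \frac{1}{2}\left(\frac{a - 1}{\sigma_{sys}}\right)^2 \qquad (7)$$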

with $N^{th}$ from eq. (5).

We have searched for maxima of $\ln L$ from eq. (7) under two hypotheses.
The first assumes no oscillation, and the only non-zero parameter is $a$. In this
case the best fit is $a = 0.892$. We point out that without the systematic
uncertainty the ML method gives $a \approx 0.6$, which corresponds to the ratio
between the measured and expected number of events [1]. The number of
events in the absence of oscillations is $77.4 \pm 5$, with the $1\sigma$ confidence interval
taken according to eq. (3). This integrated rate is in agreement within $1\sigma$ with the
number used for the absolute normalization: $86.8 \pm 5.6$ expected events
from [1]. The second hypothesis assumes oscillations according to eq. (6). In
this case we have found several local maxima and two global ones, symmetric
with respect to $\sin^2\theta_{12} = 0.5$. The best-fit points are $(a, \sin^2\theta_{12}, \delta m^2) =
(0.997, 0.424/0.576, 7.11)$, where $\delta m^2$ is given in units of $10^{-5}$ eV$^2$. To determine
the confidence regions for the parameters we have studied the profile of $\ln L$
against one parameter while maximizing with respect to the others. The
result of this study around the global maxima is shown in Fig. 2. It can be
noticed, in particular looking at the profiles of $a$, that the log-likelihood is not
parabolic. So, though small, there is a deviation from the large-N behavior for
the data under investigation. The 68.3% confidence intervals, from eq. (3),
are reported in Tab. 1.
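The no-oscillation numbers can be cross-checked with a short Python sketch. With no oscillations, $a$ only rescales the whole spectrum, so the shape-dependent terms of eq. (7) are constant and $\ln L(a) = -a N_{exp} + N_{obs} \ln a - \frac{1}{2}((a-1)/\sigma_{sys})^2$ up to a constant. The value of 54 observed events is an assumption taken from the first KamLAND publication [1] and is not stated in this text; the function names are hypothetical.

```python
import math

N_EXPECTED = 86.8   # expected events without oscillations, from the text
N_OBSERVED = 54     # assumption: observed events as reported in [1]
SIGMA_SYS = 0.064   # total systematic uncertainty, from the text


def log_likelihood(a, sigma_sys=SIGMA_SYS):
    """No-oscillation ln L(a), up to a constant; sigma_sys=None drops the penalty."""
    value = -a * N_EXPECTED + N_OBSERVED * math.log(a)
    if sigma_sys is not None:
        value -= 0.5 * ((a - 1.0) / sigma_sys) ** 2
    return value


def best_fit_scale(sigma_sys=SIGMA_SYS):
    """Maximize ln L(a) analytically."""
    if sigma_sys is None:
        return N_OBSERVED / N_EXPECTED
    # d lnL / da = 0  <=>  a^2 + (sigma^2 N_exp - 1) a - sigma^2 N_obs = 0
    b = sigma_sys ** 2 * N_EXPECTED - 1.0
    c = -(sigma_sys ** 2) * N_OBSERVED
    return (-b + math.sqrt(b * b - 4.0 * c)) / 2.0


a_hat = best_fit_scale()
# Drop in ln L at the interval end points quoted in Tab. 1 (expected close to 1/2).
drops = [log_likelihood(a_hat) - log_likelihood(edge) for edge in (0.836, 0.949)]
```

Under these assumptions the penalty term pulls the estimate from the raw ratio of observed to expected events toward one, and the end points of the quoted 68.3% interval correspond to a drop of about 1/2 in $\ln L$, as eq. (3) requires for $Q = 1$.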



Table 1: Confidence intervals at 68.3%. Best-fit values are (0.997,
$7.11 \times 10^{-5}$ eV$^2$).

Model        $a$               $\sin^2\theta_{12}$    $\delta m^2$ ($10^{-5}$ eV$^2$)
No-oscil.    [0.836, 0.949]    -                      -
Oscil.       [0.936, 1.057]    [0.253, 0.747]         [6.72, 7.67]



In order to study the goodness-of-fit for the two hypotheses under
consideration we have calculated the distribution of $\chi^2_{Per}$ by generating Poisson
values for $N_i^{obs}$ based on the mean value for $N_i^{th}$ according to the fit performed
with the ML method. For the no-oscillation hypothesis we show in Fig. 3 the
KamLAND data against the ML fit. In the upper-right corner we also show
the distribution of $\chi^2_{Per}$ together with that of a $\chi^2$ p.d.f. with 12 d.o.f. The
darkened area shows the fraction of the distribution above the measured value
of $\chi^2_{Per}$. In this case the P-value is 53% (~48% using the $\chi^2$ distribution).
Following the same reasoning, in Fig. 4 we show the ML fit assuming
oscillations. Again in the upper-right corner we report the $\chi^2_{Per}$ statistic. The g.o.f. is
93% (~90% using the $\chi^2$ distribution). For completeness, in Fig. 1 we show
the best-fit curve assuming oscillations together with the distribution of the
expected events in the standard scenario (with a = 0). Finally, in order to
compare our analysis with the others on the KamLAND results, in Fig. 5 we
show the 90%, 95% and 99% confidence regions for the oscillation parameters
(from eq. (2)) together with the profile of $\ln L(\log(\delta m^2), \sin^2\theta_{12})$. The
latter has been determined by maximizing with respect to $a$.

3 Conclusions



In conclusion, we have analyzed the first results of KamLAND through the ML
method for a no-oscillation hypothesis and an oscillation one. The former
gives a P-value of about 53% and the latter of about 93%. By chance we get a
result in perfect agreement with what is reported in [1] about how the positron
spectrum is consistent with a neutrino oscillation assumption, although we
have slightly different oscillation parameters. Furthermore, we are also in
perfect agreement with the conclusion in [1] about a no-oscillation
assumption for the spectrum analysis. However, in this paper this finding is clearly
due to the present statistics and the systematic uncertainty. On the contrary,
the method of analysis in [1] is not well explained. We should also recall
that the strongest evidence in favour of neutrino oscillations comes from the
ratio between the measured and expected number of events [1]. In this paper
we have used the information from the positron spectrum, and in order to
properly compare the two hypotheses discussed (oscillation and
no-oscillation) we have worked out the ratio $\lambda = L_{max}^{no\,osc} / L_{max}^{osc}$
(likelihood ratio), which turns out to be $8.45 \times 10^{-4}$. This small value
indicates that the observed positron spectrum in KamLAND rules out
the no-disappearance scenario with the present statistics. Computing $-2\ln\lambda$,
which follows a $\chi^2$ distribution with 2 d.o.f. in this case, it turns out that
the assumption which restricts the number of parameters, i.e. the
no-disappearance hypothesis, is rejected at the level of about 99.9%.
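The arithmetic behind the quoted rejection level can be reproduced with a few lines of Python. The sketch relies on the closed form of the $\chi^2$ tail probability for 2 d.o.f., $P(\chi^2 > x) = e^{-x/2}$; the function name is hypothetical.

```python
import math


def rejection_level_2dof(likelihood_ratio):
    """Return (-2 ln(lambda), confidence level) for a chi-square with 2 d.o.f."""
    statistic = -2.0 * math.log(likelihood_ratio)
    p_value = math.exp(-statistic / 2.0)  # chi-square tail probability, 2 d.o.f.
    return statistic, 1.0 - p_value


statistic, confidence_level = rejection_level_2dof(8.45e-4)
```

For 2 d.o.f. the tail probability equals the likelihood ratio itself, so $\lambda = 8.45 \times 10^{-4}$ corresponds directly to a rejection level of about 99.9%.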

For the sake of completeness we have reduced the systematic uncertainty
to 2% to study the trend of the fit. As shown in Fig. 3, the fit gets worse for
the no-oscillation hypothesis and the new P-value is about 30%.

We have also shown that a $\chi^2$ test, in what we could call a quasi-low-count-rate
experimental scenario, gives a result that agrees within a few percent with
that obtained using a Monte Carlo calculation for $\chi^2_{Per}$.

A limitation of the analysis presented, common to others on
the same subject, is the treatment of the systematic uncertainties, which
are combined into two main sources in [11] and into one correction factor to
the overall normalization in [13, 12, 14, 15] and here. Moreover, no matter
effects are taken into account here. Yet, this correction gives only a small
contribution [12, 23].

Figure 1: KamLAND results (dots). Solid line (histogram): expected number of events
according to an average normalization of 86.8 events and no oscillations. Dashed line:
expected distribution with oscillations and using the ML fit. Solid line: best-fit ML with
oscillations.

Figure 2: Profiles of $\ln L_{max}$ against the parameters. Upper right: no-oscillation hypothesis.

Figure 3: Best fit with the no-oscillation hypothesis. Upper left: $\chi^2_{Per}$ statistic. The
dashed line corresponds to a total systematic uncertainty reduced to 2%. See text for
details.

Figure 5: Maximum likelihood confidence regions at 90%, 95% and 99% (upper plot) for
two-flavor active neutrino oscillations at KamLAND. The best-fit points are indicated by
black dots (upper plot). Profile of $\ln L$ (lower plot) found by maximizing with respect to
$a$ (see text for details).